A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling

نویسندگان

  • Muhammad Atif Tahir
  • Josef Kittler
  • Krystian Mikolajczyk
  • Fei Yan
چکیده

In this paper, a novel inverse random under sampling (IRUS) method is proposed for class imbalance problem. The main idea is to severely under sample the negative class (majority class), thus creating a large number of distinct negative training sets. For each training set we then find a linear discriminant which separates the positive class from the negative class. By combining the multiple designs through voting, we construct a composite between the positive class and the negative class. The proposed methodology is applied on 11 UCI data sets and experimental results indicate a significant increase in Area Under Curve (AUC) when compared with many existing class-imbalance learning methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

Customer churn prediction using improved balanced random forests

Churn prediction is becoming a major focus of banks in China who wish to retain customers by satisfying their needs under resource constraints. In churn prediction, an important yet challenging problem is the imbalance in the data distribution. In this paper, we propose a novel learning method, called improved balanced random forests (IBRF), and demonstrate its application to churn prediction. ...

متن کامل

Horvitz-Thompson estimator of population mean under inverse sampling designs

Inverse sampling design is generally considered to be appropriate technique when the population is divided into two subpopulations, one of which contains only few units. In this paper, we derive the Horvitz-Thompson estimator for the population mean under inverse sampling designs, where subpopulation sizes are known. We then introduce an alternative unbiased estimator, corresponding to post-st...

متن کامل

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...

متن کامل

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009